{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "5d037743",
      "metadata": {
        "id": "5d037743"
      },
      "source": [
        "# Data Extraction\n",
        "\n",
        "In this tutorial we'll extract data from the MIMIC-IV Waveform Database.\n",
        "\n",
        "Our **objectives** are to:\n",
        "- Extract signals from one segment of a record.\n",
        "- Limit the segment to only the required duration of relevant signals (_i.e._ 10 min of photoplethysmography and blood pressure signals)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "fe20dd08",
      "metadata": {
        "id": "fe20dd08"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><b>Context:</b>\n",
        "    In the <a href=\"https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html\">Data Exploration</a> tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). Now we will go a step further to extract signals for analysis.</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "fd8a0055",
      "metadata": {
        "id": "fd8a0055"
      },
      "source": [
        "---\n",
        "## Setup\n",
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><b>Resource:</b> These steps are taken from the <a href=\"https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html\">Data Exploration</a> tutorial.</p></div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f4e37777",
      "metadata": {
        "id": "f4e37777"
      },
      "source": [
        "- Specify the required Python packages"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "10fdf08b",
      "metadata": {
        "id": "10fdf08b"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "from pathlib import Path"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ccce3426",
      "metadata": {
        "id": "ccce3426"
      },
      "source": [
        "- Install and import the WFDB toolbox"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "06c8cc1f",
      "metadata": {
        "id": "06c8cc1f",
        "outputId": "747c5f42-e691-4981-fb53-c6f38007e456",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Requirement already satisfied: wfdb==4.0.0 in /usr/local/lib/python3.7/dist-packages (4.0.0)\n",
            "Requirement already satisfied: SoundFile<0.12.0,>=0.10.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (0.10.3.post1)\n",
            "Requirement already satisfied: requests<3.0.0,>=2.8.1 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (2.23.0)\n",
            "Requirement already satisfied: pandas<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.3.5)\n",
            "Requirement already satisfied: scipy<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.4.1)\n",
            "Requirement already satisfied: matplotlib<4.0.0,>=3.2.2 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (3.2.2)\n",
            "Requirement already satisfied: numpy<2.0.0,>=1.10.1 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.21.6)\n",
            "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (2.8.2)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (1.4.3)\n",
            "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (3.0.9)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (0.11.0)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (4.1.1)\n",
            "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas<2.0.0,>=1.0.0->wfdb==4.0.0) (2022.1)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (1.15.0)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (2022.6.15)\n",
            "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (1.24.3)\n",
            "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (3.0.4)\n",
            "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (2.10)\n",
            "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from SoundFile<0.12.0,>=0.10.0->wfdb==4.0.0) (1.15.0)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->SoundFile<0.12.0,>=0.10.0->wfdb==4.0.0) (2.21)\n"
          ]
        }
      ],
      "source": [
        "!pip install wfdb==4.0.0\n",
        "import wfdb"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "524ed046",
      "metadata": {
        "id": "524ed046"
      },
      "source": [
        "- Specify the settings for the MIMIC-IV database"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "2915e121",
      "metadata": {
        "id": "2915e121"
      },
      "outputs": [],
      "source": [
        "# The name of the MIMIC-IV Waveform Database on PhysioNet\n",
        "database_name = 'mimic4wdb/0.1.0'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3ea79319",
      "metadata": {
        "id": "3ea79319"
      },
      "source": [
        "- Provide a list of segments which meet the requirements for the study (NB: these are copied from the end of the [Data Exploration Tutorial](https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html))."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "0ee58931",
      "metadata": {
        "id": "0ee58931"
      },
      "outputs": [],
      "source": [
        "segment_names = ['83404654_0005']\n",
        "segment_dirs = ['mimic4wdb/0.1.0/waves/p100/p10020306/83404654']"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0e90110a",
      "metadata": {
        "id": "0e90110a"
      },
      "source": [
        "- Specify a segment from which to extract data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "05fb68d0",
      "metadata": {
        "id": "05fb68d0",
        "outputId": "776068c7-a586-4f0a-a6b3-3154bde5459a",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Specified segment '83404654_0005' in directory: 'mimic4wdb/0.1.0/waves/p100/p10020306/83404654'\n"
          ]
        }
      ],
      "source": [
        "rel_segment_no = 0\n",
        "rel_segment_name = segment_names[rel_segment_no]\n",
        "rel_segment_dir = segment_dirs[rel_segment_no]\n",
        "print(f\"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d00513bd",
      "metadata": {
        "id": "d00513bd"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><b>Extension:</b> Have a look at the files which make up this record <a href=\"https://physionet.org/content/mimic4wdb/0.1.0/waves/p100/p10020306/\">here</a> (NB: you will need to scroll to the bottom of the page).</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3b2e6adb",
      "metadata": {
        "id": "3b2e6adb"
      },
      "source": [
        "---\n",
        "## Extract data for this segment"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e8810358",
      "metadata": {
        "id": "e8810358"
      },
      "source": [
        "- Use the [`rdrecord`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdrecord) function from the WFDB toolbox to read the data for this segment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "id": "8626ebac",
      "metadata": {
        "id": "8626ebac",
        "outputId": "8c0d3d8e-fb01-4a3a-e75f-6502941fad70",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Data loaded from segment: 83404654_0005\n"
          ]
        }
      ],
      "source": [
        "segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir) \n",
        "print(f\"Data loaded from segment: {rel_segment_name}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "5032d6c4",
      "metadata": {
        "id": "5032d6c4"
      },
      "source": [
        "- Look at class type of the object in which the data are stored:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "id": "967fa4ef",
      "metadata": {
        "id": "967fa4ef",
        "outputId": "9e9b7857-dfd6-470c-9722-9a3a196687c3",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Data stored in class of type: <class 'wfdb.io.record.Record'>\n"
          ]
        }
      ],
      "source": [
        "print(f\"Data stored in class of type: {type(segment_data)}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "cf2d5ed7",
      "metadata": {
        "id": "cf2d5ed7"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><b>Resource:</b> You can find out more about the class representing single segment WFDB records <a href=\"https://wfdb.readthedocs.io/en/stable/io.html?highlight=class#wfdb.io.Record\">here</a>.</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "85a0d656",
      "metadata": {
        "id": "85a0d656"
      },
      "source": [
        "- Find out about the signals which have been extracted"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "id": "6d5416b6",
      "metadata": {
        "id": "6d5416b6",
        "outputId": "e72883c0-0675-4e32-f814-a2bc84e03259",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "This segment contains waveform data for the following 6 signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']\n",
            "The signals are sampled at a base rate of 62.4725 Hz (and some are sampled at multiples of this)\n",
            "They last for 52.4 minutes\n"
          ]
        }
      ],
      "source": [
        "print(f\"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}\")\n",
        "print(f\"The signals are sampled at a base rate of {segment_data.fs} Hz (and some are sampled at multiples of this)\")\n",
        "print(f\"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0d40fab4",
      "metadata": {
        "id": "0d40fab4"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><b>Question:</b> Can you find out which signals are sampled at multiples of the base sampling frequency by looking at the following contents of the 'segment_data' variable?</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "id": "b0903fcf",
      "metadata": {
        "id": "b0903fcf",
        "outputId": "cee5ffd3-aaf5-4e46-9b3e-af94a844a9da",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'adc_gain': [200.0, 200.0, 200.0, 16.0, 4096.0, 4093.0],\n",
            " 'adc_res': [14, 14, 14, 13, 12, 12],\n",
            " 'adc_zero': [8192, 8192, 8192, 4096, 2048, 2048],\n",
            " 'base_counter': 10219520.0,\n",
            " 'base_date': None,\n",
            " 'base_time': None,\n",
            " 'baseline': [8192, 8192, 8192, 800, 0, 2],\n",
            " 'block_size': [0, 0, 0, 0, 0, 0],\n",
            " 'byte_offset': [None, None, None, None, None, None],\n",
            " 'checksum': [10167, 1300, 56956, 35887, 29987, 21750],\n",
            " 'comments': ['signal 0 (II): channel=0 bandpass=[0.5,35]',\n",
            "              'signal 1 (V): channel=1 bandpass=[0.5,35]',\n",
            "              'signal 2 (aVR): channel=2 bandpass=[0.5,35]'],\n",
            " 'counter_freq': 999.56,\n",
            " 'd_signal': None,\n",
            " 'e_d_signal': None,\n",
            " 'e_p_signal': None,\n",
            " 'file_name': ['83404654_0005e.dat',\n",
            "               '83404654_0005e.dat',\n",
            "               '83404654_0005e.dat',\n",
            "               '83404654_0005p.dat',\n",
            "               '83404654_0005p.dat',\n",
            "               '83404654_0005r.dat'],\n",
            " 'fmt': ['516', '516', '516', '516', '516', '516'],\n",
            " 'fs': 62.4725,\n",
            " 'init_value': [0, 0, 0, 0, 0, 0],\n",
            " 'n_sig': 6,\n",
            " 'p_signal': array([[ 0.00000000e+00, -6.50000000e-02, -5.00000000e-03,\n",
            "                    nan,  5.02929688e-01,  1.56120205e-01],\n",
            "       [ 5.00000000e-03, -4.50000000e-02, -5.00000000e-03,\n",
            "                    nan,  5.02929688e-01,  1.56853164e-01],\n",
            "       [ 1.50000000e-02, -2.50000000e-02,  5.00000000e-03,\n",
            "                    nan,  5.02929688e-01,  1.57097484e-01],\n",
            "       ...,\n",
            "       [-1.50000000e-02,  7.00000000e-02, -4.00000000e-02,\n",
            "         7.25000000e+01,  5.74951172e-01,  3.57683850e-01],\n",
            "       [-1.50000000e-02,  5.50000000e-02, -4.50000000e-02,\n",
            "         7.25000000e+01,  5.70800781e-01,  3.61104324e-01],\n",
            "       [ 0.00000000e+00,  9.00000000e-02, -5.50000000e-02,\n",
            "         7.25000000e+01,  5.62255859e-01,  3.63791840e-01]]),\n",
            " 'record_name': '83404654_0005',\n",
            " 'samps_per_frame': [4, 4, 4, 2, 2, 1],\n",
            " 'sig_len': 196480,\n",
            " 'sig_name': ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp'],\n",
            " 'skew': [None, None, None, None, None, None],\n",
            " 'units': ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']}\n"
          ]
        }
      ],
      "source": [
        "from pprint import pprint\n",
        "pprint(vars(segment_data))"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        ""
      ],
      "metadata": {
        "id": "gKtupgmahzpt"
      },
      "id": "gKtupgmahzpt",
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    },
    "toc": {
      "base_numbering": 1,
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": true,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": false,
      "toc_position": {},
      "toc_section_display": true,
      "toc_window_display": false
    },
    "colab": {
      "name": "data-extraction.ipynb",
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}